Joint Audio-Visual Unit Selection – the JAVUS Speech Synthesizer
Abstract
The author presents a speech synthesis system that selects and concatenates speech segments (units) of various sizes from a suitably prepared audio-visual speech database. The audio and video tracks of the selected segments are concatenated together to preserve audio-visual correlations. The input text is converted into a target phone chain, and the database is searched for segments that represent sub-chains of at least two phones and can be concatenated to form the target utterance. The final segment sequence is chosen from the candidate sequences by a weighted sum of concatenation criteria for the audio and the video join. The weights of these audio and video join costs can be used to trade off fluency in the audio channel against fluency in the video channel of the synthesized speech. The output is an audio-visual rendition of the input text in which the audio and video tracks are reasonably fluent, synchronous, and intelligible.
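The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the JAVUS implementation: the `Unit` fields, the scalar boundary features, the absolute-difference join distances, the non-overlapping tiling of the target, and the dynamic-programming search are all assumptions made for illustration. Only the constraint of at least two phones per segment and the weighted sum of audio and video join costs come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Unit:
    """A database segment. All feature fields are hypothetical placeholders."""
    phones: Tuple[str, ...]  # phone labels covered by the segment
    audio_in: float          # assumed audio feature at the segment start
    audio_out: float         # assumed audio feature at the segment end
    video_in: float          # assumed video feature at the segment start
    video_out: float         # assumed video feature at the segment end


def join_cost(left: Unit, right: Unit, w_audio: float, w_video: float) -> float:
    """Weighted sum of an (assumed) audio and video mismatch at one join."""
    audio = abs(left.audio_out - right.audio_in)
    video = abs(left.video_out - right.video_in)
    return w_audio * audio + w_video * video


def select_units(target: List[str], database: List[Unit],
                 w_audio: float = 1.0, w_video: float = 1.0) -> Optional[List[Unit]]:
    """Return the cheapest unit sequence that tiles the target, or None."""
    n = len(target)
    # best[(pos, k)] = (total cost, back-pointer) for the cheapest sequence
    # that covers target[:pos] and ends with database[k].
    best: Dict[Tuple[int, int], Tuple[float, Optional[Tuple[int, int]]]] = {}
    for pos in range(2, n + 1):
        for k, unit in enumerate(database):
            size = len(unit.phones)
            start = pos - size
            # Only segments of at least two phones that match the target here.
            if size < 2 or start < 0 or tuple(target[start:pos]) != unit.phones:
                continue
            if start == 0:
                best[(pos, k)] = (0.0, None)  # first segment: no join yet
                continue
            candidates = [
                (cost + join_cost(database[pk], unit, w_audio, w_video), (ppos, pk))
                for (ppos, pk), (cost, _) in best.items() if ppos == start
            ]
            if candidates:
                best[(pos, k)] = min(candidates, key=lambda c: c[0])
    finals = [(cost, key) for key, (cost, _) in best.items() if key[0] == n]
    if not finals:
        return None
    _, key = min(finals, key=lambda f: f[0])
    path: List[Unit] = []
    while key is not None:  # follow back-pointers to the first segment
        path.append(database[key[1]])
        key = best[key][1]
    return path[::-1]
```

In this sketch, raising `w_audio` relative to `w_video` favors sequences with smoother audio joins, and the reverse favors smoother video joins, which mirrors the trade-off the abstract describes.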
Similar resources
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants computers the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Introducing visual target cost within an acoustic-visual unit-selection speech synthesizer
In this paper, we present a method to take into account visual information during the selection process in an acoustic-visual synthesizer. The acoustic-visual speech synthesizer is based on the selection and concatenation of synchronous bimodal diphone units i.e., speech signal and 3D facial movements of the speaker’s face. The visual speech information is acquired using a stereovision techniqu...
Audio-Visual Unit Selection for the Synthesis of Photo-Realistic Talking-Heads
This paper investigates audio-visual unit selection for the synthesis of photo-realistic, speech-synchronized talking-head animations. These animations are synthesized from recorded video samples of a subject speaking in front of a camera, resulting in a photo-realistic appearance. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mo...
A hidden Markov model based visual speech synthesizer
This paper describes a hidden Markov model (HMM) based visual synthesizer designed to assist persons with impaired hearing. This synthesizer builds on results in the area of audio-visual speech recognition. We describe how a correlation HMM can be used to integrate independent acoustic and visual HMMs for speech-to-visual synthesis. Our results show that an HMM correlating model can significantly...
Unit Size in Unit Selection Speech Synthesis
In this paper, we address the issue of choice of unit size in unit selection speech synthesis. We discuss the development of a Hindi speech synthesizer and our experiments with different choices of units: syllable, diphone, phone and half phone. Perceptual tests conducted to evaluate the quality of the synthesizers with different unit size indicate that the syllable synthesizer performs better ...
Publication date: 2006